Search | BVS Regional Portal

1.

Compositional Structure of the Genome: A Review.

Bernaola-Galván, Pedro; Carpena, Pedro; Gómez-Martín, Cristina; Oliver, Jose L.

Biology (Basel) ; 12(6)2023 Jun 13.

Article in English | MEDLINE | ID: mdl-37372134

ABSTRACT

As the genome carries the historical information of a species' biotic and environmental interactions, analyzing changes in genome structure over time by using powerful statistical physics methods (such as entropic segmentation algorithms, fluctuation analysis in DNA walks, or measures of compositional complexity) provides valuable insights into genome evolution. Nucleotide frequencies tend to vary along the DNA chain, resulting in a hierarchically patchy chromosome structure with heterogeneities at different length scales that range from a few nucleotides to tens of millions of them. Fluctuation analysis reveals that these compositional structures can be classified into three main categories: (1) short-range heterogeneities (below a few kilobase pairs (Kbp)) primarily attributed to the alternation of coding and noncoding regions, interspersed or tandem repeats densities, etc.; (2) isochores, spanning tens to hundreds of tens of Kbp; and (3) superstructures, reaching sizes of tens of megabase pairs (Mbp) or even larger. The obtained isochore and superstructure coordinates in the first complete T2T human sequence are now shared in a public database. In this way, interested researchers can use T2T isochore data, as well as the annotations for different genome elements, to check a specific hypothesis about genome structure. Similarly to other levels of biological organization, a hierarchical compositional structure is prevalent in the genome. Once the compositional structure of a genome is identified, various measures can be derived to quantify the heterogeneity of such structure. The distribution of segment G+C content has recently been proposed as a new genome signature that proves to be useful for comparing complete genomes. Another meaningful measure is the sequence compositional complexity (SCC), which has been used for genome structure comparisons. 
Lastly, we review the recent genome comparisons in species of the ancient phylum Cyanobacteria, conducted by phylogenetic regression of SCC against time, which have revealed positive trends towards higher genome complexity. These findings provide the first evidence for a driven progressive evolution of genome compositional structure.
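The abstract names entropic segmentation and sequence compositional complexity (SCC). As a rough illustration, the sketch below scores every cut point of a DNA sequence by the Jensen-Shannon divergence between the nucleotide compositions of its two sides, and computes SCC in its usual form: the entropy of the whole sequence minus the length-weighted entropies of its segments. This is a simplified sketch, not the authors' implementation: the full algorithm applies the split recursively and stops with a statistical significance criterion, which is omitted here, and the function names (`best_split`, `scc`) are hypothetical. Non-ACGT symbols are simply ignored in the counts.

```python
import numpy as np

ALPHABET = "ACGT"

def entropy(counts):
    """Shannon entropy (bits) of a composition given as symbol counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def counts_of(seq):
    """Nucleotide counts (A, C, G, T) of a sequence; other symbols are ignored."""
    return np.array([seq.count(s) for s in ALPHABET])

def best_split(seq):
    """Cut point maximizing the Jensen-Shannon divergence between left and right parts."""
    L = len(seq)
    onehot = np.array([[c == s for s in ALPHABET] for c in seq], dtype=int)
    # prefix[i] holds the nucleotide counts of seq[:i]
    prefix = np.vstack([np.zeros((1, 4), dtype=int), np.cumsum(onehot, axis=0)])
    total = prefix[-1]
    h_total = entropy(total)
    best_pos, best_d = None, -np.inf
    for i in range(1, L):
        left, right = prefix[i], total - prefix[i]
        d = h_total - (i / L) * entropy(left) - ((L - i) / L) * entropy(right)
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d

def scc(seq, cuts):
    """Sequence compositional complexity of the segmentation defined by `cuts`."""
    L = len(seq)
    bounds = [0, *sorted(cuts), L]
    segments = [seq[a:b] for a, b in zip(bounds[:-1], bounds[1:])]
    return entropy(counts_of(seq)) - sum(len(s) / L * entropy(counts_of(s)) for s in segments)

# Usage: an AT-only half followed by a GC-only half is split exactly at the boundary.
seq = "AT" * 50 + "GC" * 50
pos, d_js = best_split(seq)
```

For this toy sequence the divergence peaks at the compositional boundary, and SCC of the resulting two-segment partition is 1 bit (2 bits for the whole sequence minus 1 bit for each homogeneous half).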

2.

Decrease of heart rate variability during exercise: An index of cardiorespiratory fitness.

Mongin, Denis; Chabert, Clovis; Extremera, Manuel Gomez; Hue, Olivier; Courvoisier, Delphine Sophie; Carpena, Pedro; Galvan, Pedro Angel Bernaola.

PLoS One ; 17(9): e0273981, 2022.

Article in English | MEDLINE | ID: mdl-36054204

ABSTRACT

The present study proposes to measure and quantify heart rate variability (HRV) changes during effort as a function of heart rate, and to test the capacity of the resulting indices to predict cardiorespiratory fitness measures. To this end, the beat-to-beat cardiac interval series of 18 adolescent athletes (15.2 ± 2.0 years) measured during a maximal graded effort test were detrended using a dynamical first-order differential equation model. HRV was then calculated as the standard deviation of the detrended RR intervals (SDRR) within successive one-minute windows. The variation of this HRV measure during exercise is well fitted by an exponential decrease with heart rate: the SDRR halves for every 20 beats/min increase in heart rate. The heart rate increase needed to halve HRV is linearly and inversely correlated with maximal oxygen consumption (r = -0.60, p = 0.006), maximal aerobic power (r = -0.62, p = 0.006) and, to a lesser extent, with the power at the ventilatory thresholds (r = -0.53, p = 0.02 and r = -0.47, p = 0.05 for the first and second thresholds, respectively). This indicates that HRV decreases faster with increasing heart rate among athletes with better fitness. This analysis, based only on cardiac measurements, provides a promising tool for studying cardiac data recorded by portable devices.
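The halving relation above can be written as SDRR(HR) = A · 2^(-HR/h), where h is the heart-rate increase that halves HRV (about 20 beats/min on average in the abstract). A minimal sketch, assuming the RR series has already been detrended (the differential-equation detrending model is not reproduced here): compute SDRR in non-overlapping one-minute windows, then estimate h by a least-squares line through log2(SDRR) versus mean heart rate. The function names and the 3-beat minimum per window are hypothetical choices for illustration.

```python
import numpy as np

def windowed_sdrr(beat_times_s, rr_detrended_ms, hr_bpm, window_s=60.0):
    """SD of detrended RR intervals and mean heart rate in successive non-overlapping windows."""
    t = np.asarray(beat_times_s, dtype=float)
    rr = np.asarray(rr_detrended_ms, dtype=float)
    hr = np.asarray(hr_bpm, dtype=float)
    window_index = np.floor((t - t[0]) / window_s).astype(int)
    hr_mean, sdrr = [], []
    for k in np.unique(window_index):
        in_window = window_index == k
        if in_window.sum() >= 3:  # skip nearly empty windows (hypothetical threshold)
            sdrr.append(rr[in_window].std(ddof=1))
            hr_mean.append(hr[in_window].mean())
    return np.array(hr_mean), np.array(sdrr)

def halving_heart_rate(hr_mean, sdrr):
    """Fit log2(SDRR) = c - HR / h by least squares and return h (beats/min)."""
    slope, _ = np.polyfit(hr_mean, np.log2(sdrr), 1)
    return -1.0 / slope
```

A smaller h means HRV collapses faster as heart rate rises, which the abstract associates with better fitness.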

Subjects

Cardiorespiratory Fitness; Adolescent; Exercise/physiology; Exercise Test; Heart Rate/physiology; Humans; Oxygen Consumption/physiology

3.

On the Validity of Detrended Fluctuation Analysis at Short Scales.

Carpena, Pedro; Gómez-Extremera, Manuel; Bernaola-Galván, Pedro A.

Entropy (Basel) ; 24(1)2021 Dec 29.

Article in English | MEDLINE | ID: mdl-35052087

ABSTRACT

Detrended Fluctuation Analysis (DFA) has become a standard method to quantify the correlations and scaling properties of real-world complex time series. For a given scale ℓ of observation, DFA provides the function F(ℓ), which quantifies the fluctuations of the time series around the local trend, which is subtracted (detrended). If the time series exhibits scaling properties, then F(ℓ) ∼ ℓ^α asymptotically, and the scaling exponent α is typically estimated as the slope of a linear fit in the log F(ℓ) vs. log ℓ plot. In this way, α measures the strength of the correlations and characterizes the underlying dynamical system. However, in many cases, and especially in physiological time series, the scaling behavior is different at short and long scales, resulting in log F(ℓ) vs. log ℓ plots with two different slopes, α1 at short scales and α2 at large scales of observation. These two exponents are usually associated with the existence of different mechanisms that work at distinct time scales acting on the underlying dynamical system. Here, however, since the power-law behavior of F(ℓ) is asymptotic, we question the use of α1 to characterize the correlations at short scales. To this end, we first show that, even for artificial time series with perfect scaling, i.e., with a single exponent α valid for all scales, DFA provides an α1 value that systematically overestimates the true exponent α. Second, when artificial time series with two different scaling exponents at short and large scales are considered, the α1 value provided by DFA not only can severely underestimate or overestimate the true short-scale exponent, but also depends on the value of the large-scale exponent.
This behavior should prevent the use of α1 to describe the scaling properties at short scales: if DFA is applied to two time series with the same scaling behavior at short scales but very different scaling properties at large scales, very different values of α1 will be obtained, although the short-scale properties are identical. These artifacts may lead to wrong interpretations when analyzing real-world time series: on the one hand, for time series with truly perfect scaling, the spurious value of α1 could lead one to wrongly conclude that some specific mechanism acts only at short time scales in the dynamical system. On the other hand, for time series with truly different scaling at short and large scales, the incorrect α1 value would not properly characterize the short-scale behavior of the dynamical system.
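A minimal DFA-1 sketch (linear detrending of the integrated profile in non-overlapping windows of size ℓ) makes the short-scale issue easy to reproduce: white noise has α = 0.5 at all scales, yet a fit restricted to short scales typically gives a noticeably larger α1 than a fit at long scales. The function names and scale ranges are illustrative assumptions, not the authors' code.

```python
import numpy as np

def dfa(x, scales, order=1):
    """DFA fluctuation function F(ell) for each window size ell in `scales`."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())                               # integrated series
    F = []
    for ell in scales:
        n_win = len(profile) // ell
        segments = profile[: n_win * ell].reshape(n_win, ell).T     # one window per column
        t = np.arange(ell)
        coeffs = np.polyfit(t, segments, order)                      # local polynomial trend per window
        residuals = segments - np.vander(t, order + 1) @ coeffs      # detrended fluctuations
        F.append(np.sqrt(np.mean(residuals ** 2)))
    return np.array(F)

def slope(scales, F):
    """Scaling exponent estimated as the slope of log F(ell) vs. log ell."""
    return np.polyfit(np.log(scales), np.log(F), 1)[0]

# White noise: the true exponent is 0.5 at every scale.
rng = np.random.default_rng(42)
x = rng.standard_normal(2**16)
short = np.array([4, 5, 6, 8, 10, 12, 16])
long_ = 2 ** np.arange(6, 13)            # 64 ... 4096
alpha1 = slope(short, dfa(x, short))
alpha2 = slope(long_, dfa(x, long_))
```

The long-scale fit recovers roughly 0.5, while the short-scale fit is biased upward because the local linear fit removes a larger fraction of the variance in very short windows.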

4.

Transforming Gaussian correlations. Applications to generating long-range power-law correlated time series with arbitrary distribution.

Carpena, Pedro; Bernaola-Galván, Pedro A; Gómez-Extremera, Manuel; Coronado, Ana V.

Chaos ; 30(8): 083140, 2020 Aug.

Article in English | MEDLINE | ID: mdl-32872793

ABSTRACT

The observable outputs of many complex dynamical systems consist of time series exhibiting autocorrelation functions with a great diversity of behaviors, including long-range power-law autocorrelation functions, as a signature of interactions operating at many temporal or spatial scales. Often, numerical algorithms able to generate correlated noises reproducing the properties of real time series are used to study and characterize such systems. Many of those algorithms typically produce Gaussian time series. However, real, experimentally observed time series are often non-Gaussian and may follow distributions with a diversity of behaviors concerning the support, the symmetry, or the tail properties. It is always possible to transform a correlated Gaussian time series into a time series with a different marginal distribution, but the question is how this transformation affects the behavior of the autocorrelation function. Here, we study analytically and numerically how the Pearson correlation of two Gaussian variables changes when the variables are transformed to follow a different destination distribution. Specifically, we consider bounded and unbounded distributions, symmetric and non-symmetric distributions, and distributions with different tail properties, from decays faster than exponential to heavy-tailed cases including power laws, and we find how these properties affect the correlation of the final variables. We extend these results to Gaussian time series that are transformed to have a different marginal distribution, and show how the autocorrelation function of the final non-Gaussian time series depends on the Gaussian correlations and on the final marginal distribution. As an application of our results, we propose how to generalize standard algorithms producing Gaussian power-law correlated time series in order to create synthetic time series with an arbitrary distribution and controlled power-law correlations.
Finally, we show a practical example of this algorithm by generating time series mimicking the marginal distribution and the power-law tail of the autocorrelation function of real time series: the absolute returns of stock prices.
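The basic pipeline discussed here can be sketched in two standard steps: generate a Gaussian power-law correlated series by Fourier filtering (power spectrum ∝ f^-(2α-1)), then map it to a target marginal with y = F⁻¹(Φ(x)), where Φ is the standard normal CDF. This sketch does not include the paper's key contribution, namely how to correct the input Gaussian correlations so that the transformed series keeps the desired ones; it only shows where that correction would apply. The exponential target and all function names are hypothetical choices.

```python
import math
import numpy as np

def fourier_filtered_gaussian(n, alpha, rng):
    """Gaussian series with power spectrum ~ f**-(2*alpha - 1) via Fourier filtering."""
    beta = 2.0 * alpha - 1.0
    freqs = np.fft.rfftfreq(n)
    amplitude = np.zeros_like(freqs)
    amplitude[1:] = freqs[1:] ** (-beta / 2.0)        # f = 0 term left at zero (zero mean)
    noise = rng.standard_normal(freqs.size) + 1j * rng.standard_normal(freqs.size)
    x = np.fft.irfft(amplitude * noise, n)
    return (x - x.mean()) / x.std()                    # standardize to mean 0, SD 1

_erf = np.vectorize(math.erf)

def to_marginal(x, inverse_cdf):
    """Map a standard Gaussian series to a target marginal: y = F^{-1}(Phi(x))."""
    u = 0.5 * (1.0 + _erf(x / math.sqrt(2.0)))        # standard normal CDF
    u = np.clip(u, 1e-12, 1.0 - 1e-12)                 # keep F^{-1} finite at the extremes
    return inverse_cdf(u)

def exponential_inverse_cdf(u, scale=1.0):
    """Inverse CDF of an exponential distribution (hypothetical target marginal)."""
    return -scale * np.log1p(-u)

rng = np.random.default_rng(7)
x = fourier_filtered_gaussian(2**14, alpha=0.8, rng=rng)
y = to_marginal(x, exponential_inverse_cdf)
```

Because the mapping is monotonic, positive correlations survive the transformation, but their values change with the target distribution, which is exactly the effect the paper quantifies.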

5.

Connection of the nearest-neighbor spacing distribution and the local box-counting dimension for discrete sets.

Carpena, Pedro; Coronado, Ana V.

Phys Rev E ; 100(2-1): 022205, 2019 Aug.

Article in English | MEDLINE | ID: mdl-31574656

ABSTRACT

In a recent work [Phys. Rev. E 97, 030202(R) (2018), doi:10.1103/PhysRevE.97.030202], Sakhr and Nieminen (SN) resolved a hypothesis formulated two decades ago, according to which the local box-counting dimension D_{box}(r) of a given energy spectrum, or more generally of a discrete set, should depend exclusively on the nearest-neighbor spacing distribution P(s) of the spectrum (set). SN derived this dependence analytically, which led them to closed formulas for the local box-counting dimension of Poisson spectra and of spectra belonging to the Gaussian orthogonal, unitary, and symplectic ensembles. Here, first, we present a different derivation of the equation establishing the connection between D_{box}(r) and P(s) using the concept of surrogate spectrum. Although our equation is formally different from the SN result, we prove that both are equivalent. Second, we apply our equation to solve the inverse problem of determining the functional form of P(s) for spectra with real fractal structure and constant box-counting dimension D_{box}, and we find that P(s) should behave as a power law of the spacing, with an exponent given by -(1+D_{box}). Finally, we present four applications or consequences of this last result. First, we provide a simple algorithm able to generate random fractal spectra with prescribed and constant D_{box}. Second, we calculate D_{box} for the sets given by the zeros of fractional Brownian motions, whose P(s) is known to have a power-law tail. Third, we also study D_{box}(r) for the zeros of fractional Gaussian noises, whose P(s) is known to present fat (but not power-law) tails, and which could be misinterpreted as real fractals. And finally, we present the calculation of D_{box} for the spectra of Fibonacci Hamiltonians, known to have fractal properties, simply by fitting their corresponding P(s) to a power law without the need to apply a box-counting algorithm.
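One way to build a spectrum consistent with the result P(s) ∝ s^-(1+D_box) is to draw independent spacings from a Pareto law with tail exponent D_box (0 < D_box < 1) and accumulate them into levels; a direct box count should then give a dimension close to D_box over intermediate scales. This is a sketch under those assumptions, not necessarily the authors' algorithm: the unit lower cutoff of the spacings, the box sizes and the function names are hypothetical, and finite-size corrections bias the estimate slightly.

```python
import numpy as np

def fractal_spectrum(n, d_box, rng):
    """Levels whose spacings follow P(s) ~ s**-(1 + d_box) for s >= 1 (Pareto tail)."""
    u = rng.random(n)
    spacings = (1.0 - u) ** (-1.0 / d_box)        # inverse-CDF sampling: P(S > s) = s**-d_box
    return np.cumsum(spacings)

def box_count(levels, r):
    """Number of boxes of size r needed to cover the levels."""
    return len(np.unique(np.floor(levels / r)))

def estimate_d_box(levels, radii):
    """Box-counting dimension from the slope of log N(r) vs. log r."""
    counts = [box_count(levels, r) for r in radii]
    slope, _ = np.polyfit(np.log(radii), np.log(counts), 1)
    return -slope

rng = np.random.default_rng(1)
levels = fractal_spectrum(100_000, d_box=0.5, rng=rng)
d_est = estimate_d_box(levels, np.logspace(2, 4, 9))
```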

6.

Comparison of methods for the assessment of nonlinearity in short-term heart rate variability under different physiopathological states.

Faes, Luca; Gómez-Extremera, Manuel; Pernice, Riccardo; Carpena, Pedro; Nollo, Giandomenico; Porta, Alberto; Bernaola-Galván, Pedro.

Chaos ; 29(12): 123114, 2019 Dec.

Article in English | MEDLINE | ID: mdl-31893647

ABSTRACT

Despite the widespread diffusion of nonlinear methods for heart rate variability (HRV) analysis, the presence and the extent to which nonlinear dynamics contribute to short-term HRV are still controversial. This work aims at testing the hypothesis that different types of nonlinearity can be observed in HRV depending on the method adopted and on the physiopathological state. Two entropy-based measures of time series complexity (normalized complexity index, NCI) and regularity (information storage, IS), and a measure quantifying deviations from linear correlations in a time series (Gaussian linear contrast, GLC), are applied to short HRV recordings obtained in young (Y) and old (O) healthy subjects and in myocardial infarction (MI) patients monitored in the resting supine position and in the upright position reached through head-up tilt. The method of surrogate data is employed to detect the presence and quantify the contribution of nonlinear dynamics to HRV. We find that the three measures differ both in their variations across groups and conditions and in the percentage and strength of nonlinear HRV dynamics. NCI and IS displayed opposite variations, suggesting more complex dynamics in O and MI compared to Y and less complex dynamics during tilt. The strength of nonlinear dynamics is reduced by tilt using all measures in Y, while only GLC detects a significant strengthening of such dynamics in MI. A large percentage of detected nonlinear dynamics is revealed only by the IS measure in the Y group at rest, with a decrease in O and MI and during tilt, while NCI and GLC detect lower percentages in all groups and conditions. While these results suggest that distinct dynamic structures may lie beneath short-term HRV in different physiological states and pathological conditions, the strong dependence on the measure adopted and on its implementation suggests that physiological interpretations should be provided with caution.
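The abstract does not say which surrogate type was used; a common choice for testing nonlinearity is the iterative amplitude-adjusted Fourier transform (IAAFT), sketched below as an assumption. Each surrogate keeps the exact values of the original series and approximately its power spectrum (hence its linear correlations) while randomizing everything else, so a statistic such as NCI, IS or GLC that differs significantly between the original and a set of surrogates points to nonlinear dynamics. The function name and iteration count are hypothetical.

```python
import numpy as np

def iaaft_surrogate(x, rng, n_iter=100):
    """Surrogate preserving the value distribution and (approximately) the power spectrum of x."""
    x = np.asarray(x, dtype=float)
    sorted_x = np.sort(x)
    target_amplitude = np.abs(np.fft.rfft(x))
    s = rng.permutation(x)                                    # random shuffle as starting point
    for _ in range(n_iter):
        phases = np.angle(np.fft.rfft(s))                     # keep the current phases...
        s = np.fft.irfft(target_amplitude * np.exp(1j * phases), x.size)  # ...impose the spectrum
        s = sorted_x[np.argsort(np.argsort(s))]               # impose the original values by rank
    return s
```

In practice, one generates many surrogates and compares the statistic computed on the original series with its distribution over the surrogates.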

Subjects

Heart Rate/physiology; Nonlinear Dynamics; Adult; Entropy; Female; Humans; Male; Middle Aged; Time Factors

7.

Differences in nonlinear heart dynamics during rest and exercise and for different training.

Gómez-Extremera, Manuel; Bernaola-Galván, Pedro A; Vargas, Salvador; Benítez-Porres, Javier; Carpena, Pedro; Romance, A Ramón.

Physiol Meas ; 39(8): 084008, 2018 08 31.

Article in English | MEDLINE | ID: mdl-30091423

ABSTRACT

OBJECTIVE: In this work we analyze differences in nonlinear properties between rest and exercise, and also study the permanent effects of physical exercise on heart rate dynamics. APPROACH: It has been shown that physical exercise alters heart dynamics by increasing heart rate and decreasing variability, modifying spectral power and linear correlations, etc. We hypothesize that physical exercise should also reduce nonlinearity in the heartbeat time series. To quantify nonlinearity in the heartbeat time series, we use an index of nonlinearity recently proposed by Bernaola et al., based on correlations of the magnitude time series. MAIN RESULTS: Our results confirm our initial hypothesis of loss of nonlinearity during physical exercise. Moreover, regarding the permanent effects of physical exercise on heart rate dynamics, we also find that aerobic physical training tends to increase nonlinearity in heart dynamics during rest. SIGNIFICANCE: It is well known that heart dynamics are controlled by complex interactions between the sympathetic and parasympathetic branches of the autonomic nervous system. Moreover, these two branches act in a competing way, resulting in a clear parasympathetic withdrawal and sympathetic activation during physical exercise. We associate these interactions during physical exercise with a drastic loss of nonlinear properties in the heartbeat time series, revealing the importance of nonlinearity measures in the study of complex systems.

Subjects

Exercise/physiology; Heart/physiology; Nonlinear Dynamics; Rest/physiology; Adult; Heart Rate; Humans; Male

8.

NGSmethDB 2017: enhanced methylomes and differential methylation.

Lebrón, Ricardo; Gómez-Martín, Cristina; Carpena, Pedro; Bernaola-Galván, Pedro; Barturen, Guillermo; Hackenberg, Michael; Oliver, José L.

Nucleic Acids Res ; 45(D1): D97-D103, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27794041

ABSTRACT

The 2017 update of NGSmethDB stores whole-genome methylomes generated from short-read data sets obtained by whole-genome bisulfite sequencing (WGBS) technology. To generate high-quality methylomes, stringent quality controls were integrated with third-party software, adding also a two-step mapping process to exploit the advantages of the new genome assembly models. The samples were all profiled under constant parameter settings, thus enabling comparative downstream analyses. Besides a significant increase in the number of samples, NGSmethDB now includes two additional data types, which are a valuable resource for the discovery of methylation epigenetic biomarkers: (i) differentially methylated single cytosines; and (ii) methylation segments (i.e. genome regions of homogeneous methylation). The NGSmethDB back-end is now based on MongoDB, a NoSQL hierarchical database using JSON-formatted documents and dynamic schemas, thus accelerating sample comparative analyses. Besides conventional database dumps, track hubs were implemented, which improved database access, visualization in genome browsers and comparative analyses with third-party annotations. In addition, the database can also be accessed through a RESTful API. Lastly, a Python client and a multiplatform virtual machine allow for program-driven access from the user's desktop. This way, private methylation data can be compared to NGSmethDB without the need to upload them to public servers. Database website: http://bioinfo2.ugr.es/NGSmethDB.

Subjects

DNA Methylation; Databases, Nucleic Acid; Animals; Cytosine/metabolism; Genome; Humans

9.

Correlations in magnitude series to assess nonlinearities: Application to multifractal models and heartbeat fluctuations.

Bernaola-Galván, Pedro A; Gómez-Extremera, Manuel; Romance, A Ramón; Carpena, Pedro.

Phys Rev E ; 96(3-1): 032218, 2017 Sep.

Article in English | MEDLINE | ID: mdl-29347013

ABSTRACT

The correlation properties of the magnitude of a time series are associated with nonlinear and multifractal properties and have been applied in a great variety of fields. Here we have obtained the analytical expression of the autocorrelation of the magnitude series (C_{|x|}) of a linear Gaussian noise as a function of its autocorrelation (C_{x}). For both models and natural signals, the deviation of C_{|x|} from its expectation in linear Gaussian noises can be used as an index of nonlinearity that can be applied to relatively short records and does not require the presence of scaling in the time series under study. In a model of artificial Gaussian multifractal signal we use this approach to analyze the relation between nonlinearity and multifractality and show that the former implies the latter but the reverse is not true. We also apply this approach to analyze experimental data: heartbeat records during rest and moderate exercise. For each individual subject, we observe higher nonlinearities during rest. This behavior also holds on average for the analyzed set of 10 semiprofessional soccer players. This result agrees with the fact that other measures of complexity are dramatically reduced during exercise and can shed light on its relationship with the withdrawal of parasympathetic tone and/or the activation of sympathetic activity during physical activity.
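For a zero-mean bivariate Gaussian pair with correlation c, a standard identity gives the correlation of the absolute values as C_{|x|} = (2/(π − 2)) · (√(1 − c²) + c·arcsin(c) − 1), which is 0 at c = 0 and 1 at c = 1. The sketch below uses this identity as the linear-Gaussian expectation and averages the observed deviation over the first lags; that averaging is a hypothetical way to condense it into one number and may differ from the paper's exact index. Function names and `max_lag` are assumptions.

```python
import numpy as np

def expected_magnitude_corr(c):
    """Autocorrelation of |x| expected for a linear Gaussian noise with autocorrelation c."""
    c = np.clip(np.asarray(c, dtype=float), -1.0, 1.0)
    return (2.0 / (np.pi - 2.0)) * (np.sqrt(1.0 - c**2) + c * np.arcsin(c) - 1.0)

def autocorr(x, max_lag):
    """Sample autocorrelation at lags 1..max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    norm = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / norm for k in range(1, max_lag + 1)])

def nonlinearity_index(x, max_lag=10):
    """Mean deviation of the observed magnitude autocorrelation from its linear-Gaussian expectation."""
    x = np.asarray(x, dtype=float)
    c_x = autocorr(x, max_lag)
    c_abs = autocorr(np.abs(x - x.mean()), max_lag)
    return float(np.mean(c_abs - expected_magnitude_corr(c_x)))
```

For a linear Gaussian process the index should hover around zero; positive values indicate magnitude correlations beyond what the linear correlations alone can explain.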

Subjects

Fractals; Models, Theoretical; Nonlinear Dynamics; Athletes; Heart Rate; Humans; Male; Rest/physiology; Running/physiology; Soccer; Time Factors; Young Adult

10.

Probability distribution of intersymbol distances in random symbolic sequences: Applications to improving detection of keywords in texts and of amino acid clustering in proteins.

Carpena, Pedro; Bernaola-Galván, Pedro A; Carretero-Campos, Concepción; Coronado, Ana V.

Phys Rev E ; 94(5-1): 052302, 2016 Nov.

Article in English | MEDLINE | ID: mdl-27967154

ABSTRACT

Symbolic sequences have been extensively investigated in the past few years within the framework of statistical physics. Paradigmatic examples of such sequences are written texts, and deoxyribonucleic acid (DNA) and protein sequences. In these examples, the spatial distribution of a given symbol (a word, a DNA motif, an amino acid) is a key property usually related to the symbol's importance in the sequence: The more uneven and far from random the symbol distribution, the higher the relevance of the symbol to the sequence. Thus, many techniques of analysis measure in some way the deviation of the symbol spatial distribution with respect to the random expectation. The problem is then to know the spatial distribution corresponding to randomness, which is typically considered to be either the geometric or the exponential distribution. However, these distributions are only valid for very long symbolic sequences and for many occurrences of the analyzed symbol. Here, we obtain analytically the exact, randomly expected spatial distribution valid for any sequence length and any symbol frequency, and we study its main properties. Knowledge of this distribution allows us to define a measure able to properly quantify the deviation from randomness of the symbol distribution, especially for short sequences and low symbol frequency. We apply the measure to the problem of keyword detection in written texts and to study amino acid clustering in protein sequences. In texts, we show how the results improve with respect to previous methods when short texts are analyzed. In proteins, which are typically short, we show how the measure unambiguously quantifies amino acid clustering and characterizes its spatial distribution.
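To see why the geometric law is only an approximation, consider a simple exact case under an assumed periodic (circular) boundary: a symbol occupies n random positions of a circular sequence of length N. Fixing one occurrence by symmetry, the other n − 1 are uniform among the remaining N − 1 positions, and the next occurrence lies at distance d exactly when the d − 1 intermediate positions are empty, giving P(d) = C(N − d − 1, n − 2) / C(N − 1, n − 1). This is a sketch for illustration; the paper's exact distribution may treat sequence boundaries differently, and all function names are hypothetical.

```python
import random
from collections import Counter
from math import comb

def exact_gap_pmf(N, n):
    """Exact P(d) for the distance between consecutive occurrences of a symbol placed
    at n random positions of a circular sequence of length N (requires n >= 2)."""
    total = comb(N - 1, n - 1)
    return {d: comb(N - d - 1, n - 2) / total for d in range(1, N - n + 2)}

def geometric_pmf(N, n, d):
    """Long-sequence approximation: geometric law with success probability n/N."""
    p = n / N
    return p * (1 - p) ** (d - 1)

def monte_carlo_gap_pmf(N, n, trials=20_000, seed=0):
    """Empirical gap frequencies from random circular placements (for checking)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        pos = sorted(rng.sample(range(N), n))
        gaps = [b - a for a, b in zip(pos, pos[1:])] + [pos[0] + N - pos[-1]]
        counts.update(gaps)
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()}
```

For N = 100 and n = 5, the exact probability of adjacent occurrences is 4/99 ≈ 0.040, whereas the geometric approximation gives 0.05: a sizable error precisely in the short-sequence, low-frequency regime the abstract targets.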

Subjects

Amino Acids/chemistry; Computational Biology/methods; Models, Theoretical; Probability; Algorithms; Amino Acid Sequence; Cluster Analysis; Periodicity; Proteins/chemistry; Sequence Analysis

11.

Magnitude and sign of long-range correlated time series: Decomposition and surrogate signal generation.

Gómez-Extremera, Manuel; Carpena, Pedro; Ivanov, Plamen Ch; Bernaola-Galván, Pedro A.

Phys Rev E ; 93: 042201, 2016 04.

Article in English | MEDLINE | ID: mdl-27176287

ABSTRACT

We systematically study the scaling properties of the magnitude and sign of the fluctuations in correlated time series, which is a simple and useful approach to distinguish between systems with different dynamical properties but the same linear correlations. First, we decompose artificial long-range power-law linearly correlated time series into magnitude and sign series derived from the consecutive increments in the original series, and we study their correlation properties. We find analytical expressions for the correlation exponent of the sign series as a function of the exponent of the original series. Such expressions are necessary for modeling surrogate time series with desired scaling properties. Next, we study linear and nonlinear correlation properties of series composed as products of independent magnitude and sign series. These surrogate series can be considered as a zero-order approximation to the analysis of the coupling of magnitude and sign in real data, a problem still open in many fields. We find analytical results for the scaling behavior of the composed series as a function of the correlation exponents of the magnitude and sign series used in the composition, and we determine the ranges of magnitude and sign correlation exponents leading to either single scaling or to crossover behaviors. Finally, we obtain how the linear and nonlinear properties of the composed series depend on the correlation exponents of their magnitude and sign series. Based on this information we propose a method to generate surrogate series with controlled correlation exponent and multifractal spectrum.
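The decomposition and the product composition described above can be sketched in a few lines: the increments of a series are split into a magnitude series |Δx| and a sign series sgn(Δx), and a series can be rebuilt (or a surrogate composed) by multiplying a magnitude series by a sign series and integrating. Generating magnitude and sign series with prescribed correlation exponents, the core of the paper's method, is not shown; the function names are hypothetical.

```python
import numpy as np

def magnitude_sign(x):
    """Magnitude and sign series of the consecutive increments of x."""
    increments = np.diff(np.asarray(x, dtype=float))
    return np.abs(increments), np.sign(increments)

def compose(magnitude, sign, start=0.0):
    """Series whose increments are magnitude * sign, integrated from `start`."""
    increments = np.asarray(magnitude, dtype=float) * np.asarray(sign, dtype=float)
    return start + np.concatenate(([0.0], np.cumsum(increments)))

rng = np.random.default_rng(2)
x = np.cumsum(rng.standard_normal(1000))
m, s = magnitude_sign(x)
# Surrogate: original magnitudes combined with independent random signs.
surrogate = compose(m, rng.choice([-1.0, 1.0], size=m.size))
```

Combining the original magnitudes with independent signs is the simplest zero-order coupling: it keeps the magnitude correlations while destroying the sign correlations.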

Subjects

Linear Models; Nonlinear Dynamics; Algorithms; Fourier Analysis; Fractals; Time Factors

12.

Phase transitions in the first-passage time of scale-invariant correlated processes.

Carretero-Campos, Concepción; Bernaola-Galván, Pedro; Ivanov, Plamen Ch; Carpena, Pedro.

Phys Rev E Stat Nonlin Soft Matter Phys ; 85(1 Pt 1): 011139, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22400544

ABSTRACT

A key quantity describing the dynamics of complex systems is the first-passage time (FPT). The statistical properties of FPT depend on the specifics of the underlying system dynamics. We present a unified approach to account for the diversity of statistical behaviors of FPT observed in real-world systems. We find three distinct regimes, separated by two transition points, with fundamentally different behavior for FPT as a function of increasing strength of the correlations in the system dynamics: stretched exponential, power-law, and saturation regimes. In the saturation regime, the average length of FPT diverges proportionally to the system size, with important implications for understanding electronic delocalization in one-dimensional correlated-disordered systems.

Subjects

Models, Chemical; Models, Molecular; Models, Statistical; Phase Transition; Computer Simulation; Statistics as Topic

13.

Clustering of DNA words and biological function: a proof of principle.

Hackenberg, Michael; Rueda, Antonio; Carpena, Pedro; Bernaola-Galván, Pedro; Barturen, Guillermo; Oliver, José L.

J Theor Biol ; 297: 127-36, 2012 Mar 21.

Article in English | MEDLINE | ID: mdl-22226985

ABSTRACT

Relevant words in literary texts (key words) are known to be clustered, while common words are randomly distributed. Given the clustered distribution of many functional genome elements, we hypothesize that the biological text par excellence, the DNA sequence, might behave in the same way: k-length words (k-mers) with a clear function may be spatially clustered along the one-dimensional chromosome sequence, while less important, non-functional words may be randomly distributed. To explore this linguistic analogy, we calculate a clustering coefficient for each k-mer (k = 2-9 bp) in human and mouse chromosome sequences, and then check whether clustered words are enriched in the functional part of the genome. First, we found a positive general trend relating clustering level and word enrichment within exons and Transcription Factor Binding Sites (TFBSs), while a much weaker relation exists for repeats, and no relation at all exists for introns. Second, we found that 38.45% of the 200 top-clustered 8-mers, but only 7.70% of the non-clustered words, are represented in known motif databases. Third, enrichment/depletion experiments show that highly clustered words are significantly enriched in exons and TFBSs, while they are depleted in introns and repetitive DNA. Considering exons and TFBSs together, 1417 (72.26%) in human and 1385 (72.97%) in mouse of the top-clustered 8-mers showed a statistically significant association with either exons or TFBSs, thus strongly supporting the link between word clustering and biological function. Lastly, we identified a subset of clustered, diagnostic words that are enriched in exons but depleted in introns, and therefore might help to discriminate between these two gene regions. The clustering of DNA words thus appears as a novel principle to detect functionality in genome sequences.
As evolutionary conservation is not a prerequisite, the proof of principle described here may open new ways to detect species-specific functional DNA sequences and the improvement of gene and promoter predictions, thus contributing to the quest for function in the genome.
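The abstract does not define its clustering coefficient, so the sketch below uses a common stand-in from keyword detection in texts: the ratio σ of the standard deviation to the mean of the distances between consecutive occurrences of a word. For randomly placed, low-frequency words σ is close to 1 (the value for a geometric distribution of distances), regularly spaced words give σ < 1, and clustered words give σ > 1. This is an illustrative assumption, not necessarily the paper's measure; the function names are hypothetical.

```python
import numpy as np

def kmer_positions(seq, kmer):
    """Start positions of (possibly overlapping) occurrences of `kmer` in `seq`."""
    k = len(kmer)
    return np.array([i for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer])

def clustering_sigma(positions):
    """Std/mean of distances between consecutive occurrences (needs at least 3 occurrences).
    About 1 for a random layout, below 1 for regular spacing, above 1 for clustering."""
    d = np.diff(np.sort(np.asarray(positions)))
    return float(d.std() / d.mean())
```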

Subjects

DNA/genetics; Models, Genetic; Algorithms; Animals; Base Sequence; Binding Sites/genetics; Cluster Analysis; Exons/genetics; Humans; Introns/genetics; Linguistics; Mice; Species Specificity; Transcription Factors/genetics

14.

WordCluster: detecting clusters of DNA words and genomic elements.

Hackenberg, Michael; Carpena, Pedro; Bernaola-Galván, Pedro; Barturen, Guillermo; Alganza, Angel M; Oliver, José L.

Algorithms Mol Biol ; 6: 2, 2011 Jan 24.

Article in English | MEDLINE | ID: mdl-21261981

ABSTRACT

BACKGROUND: Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well-established examples are genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. RESULTS: We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method in a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between inside and outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. CONCLUSIONS: WordCluster seems to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. The implementation of the method as a web server is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php, including additional features like the detection of co-localization with gene regions or the annotation enrichment tool for functional analysis of overlapped genes.
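The general idea of distance-based cluster detection with a significance value can be sketched as follows: consecutive occurrences closer than a maximum gap are chained into a candidate cluster, and each candidate is scored by the probability of seeing at least that many occurrences within its span if the word were scattered at random with its genome-wide density (a binomial tail). This is a simplified stand-in, not WordCluster's actual statistic; `max_gap`, `alpha` and the function names are hypothetical, and computing 1 − CDF loses precision for extremely small p-values.

```python
import math

def binom_sf(k, m, p):
    """P(X >= k) for X ~ Binomial(m, p), with the CDF terms computed in log space."""
    cdf = 0.0
    for j in range(k):
        log_pmf = (math.lgamma(m + 1) - math.lgamma(j + 1) - math.lgamma(m - j + 1)
                   + j * math.log(p) + (m - j) * math.log1p(-p))
        cdf += math.exp(log_pmf)
    return max(0.0, 1.0 - cdf)

def find_clusters(positions, seq_len, max_gap, alpha=1e-3):
    """Chain occurrences separated by at most max_gap; keep chains unlikely under a random layout."""
    pos = sorted(positions)
    p = len(pos) / seq_len                      # background density of the word
    clusters, start = [], 0
    for i in range(1, len(pos) + 1):
        if i == len(pos) or pos[i] - pos[i - 1] > max_gap:
            group = pos[start:i]
            if len(group) >= 2:
                span = group[-1] - group[0] + 1
                pval = binom_sf(len(group), span, p)
                if pval < alpha:
                    clusters.append((group[0], group[-1], len(group), pval))
            start = i
    return clusters

# Toy example: sparse background occurrences plus one dense run.
background = list(range(0, 10_000, 500))
dense = list(range(3001, 3021))
clusters = find_clusters(background + dense, seq_len=10_000, max_gap=5)
```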

15.

Prediction of CpG-island function: CpG clustering vs. sliding-window methods.

Hackenberg, Michael; Barturen, Guillermo; Carpena, Pedro; Luque-Escamilla, Pedro L; Previti, Christopher; Oliver, José L.

BMC Genomics ; 11: 327, 2010 May 26.

Article in English | MEDLINE | ID: mdl-20500903

ABSTRACT

BACKGROUND: Unmethylated stretches of CpG dinucleotides (CpG islands) are an outstanding property of mammalian genomes. Conventionally, these regions are detected by sliding-window approaches using %G + C, CpG observed/expected ratio and length thresholds as main parameters. Recently, clustering methods have been used to directly detect clusters of CpG dinucleotides as a statistical property of the genome sequence. RESULTS: We compare sliding-window to clustering (i.e. CpGcluster) predictions by applying new ways to detect putative functionality of CpG islands. Analyzing the co-localization with several genomic regions as a function of window size vs. statistical significance (p-value), CpGcluster shows a higher overlap with promoter regions and highly conserved elements, at the same time showing less overlap with Alu retrotransposons. The major difference in the prediction was found for short islands (CpG islets), often exclusively predicted by CpGcluster. Many of these islets seem to be functional, as they are unmethylated, highly conserved and/or located within the promoter region. Finally, we show that window-based islands can spuriously overlap several, differentially regulated promoters as well as different methylation domains, which might indicate a wrong merge of several CpG islands into a single, very long island. The shorter CpGcluster islands seem to be much more specific with respect to the overlap with alternative transcription start sites or the detection of homogeneous methylation domains. CONCLUSIONS: The main difference between sliding-window approaches and clustering methods is the length of the predicted islands. Short islands, often differentially methylated, are almost exclusively predicted by CpGcluster. This suggests that CpGcluster may be the algorithm of choice to explore the function of these short, but putatively functional CpG islands.
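For contrast with CpGcluster, the conventional sliding-window criteria mentioned above can be sketched directly: each window is accepted when its G+C fraction and its CpG observed/expected ratio, computed as count(CG) · length / (count(C) · count(G)), reach the thresholds. The defaults (200 bp, 50% G+C, ratio 0.6) follow commonly used Gardiner-Garden and Frommer-style values and are assumptions here; real tools also merge overlapping accepted windows into islands, which this sketch omits. Function names are hypothetical.

```python
def window_stats(window):
    """G+C fraction and CpG observed/expected ratio of an uppercase DNA window."""
    length = len(window)
    c, g = window.count("C"), window.count("G")
    cpg = window.count("CG")                 # "CG" cannot overlap itself, so count() is exact
    gc = (c + g) / length
    obs_exp = cpg * length / (c * g) if c and g else 0.0
    return gc, obs_exp

def sliding_window_islands(seq, window=200, step=1, min_gc=0.5, min_obs_exp=0.6):
    """Start positions of windows passing both thresholds."""
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc, obs_exp = window_stats(seq[start:start + window])
        if gc >= min_gc and obs_exp >= min_obs_exp:
            hits.append(start)
    return hits
```

Because every accepted window has a fixed minimum length, short CpG-rich stretches tend to be either missed or merged with their surroundings, which is the length effect the abstract highlights.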

Subjects

Algorithms; CpG Islands; Alu Elements/genetics; Cluster Analysis; Conserved Sequence/genetics; DNA Methylation/genetics; Evolution, Molecular; Humans; Promoter Regions, Genetic/genetics
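The sliding-window approach contrasted with CpGcluster above can be sketched as follows. This is a minimal illustration, not the authors' code: the default thresholds (200 bp window, G+C fraction of at least 0.5, CpG observed/expected ratio of at least 0.6) are the commonly cited Gardiner-Garden and Frommer criteria, used here as assumptions, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (G+C fraction, CpG observed/expected ratio) for a DNA string."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_frac = (c + g) / n if n else 0.0
    # observed/expected CpG = CpG count * length / (C count * G count)
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_frac, obs_exp


def sliding_window_cgis(seq, window=200, step=1, min_gc=0.5, min_oe=0.6):
    """Merge every window that passes both thresholds into (start, end)
    intervals, 0-based with end exclusive. The default thresholds are an
    assumption (commonly cited values), not taken from the paper."""
    islands = []
    for start in range(0, len(seq) - window + 1, step):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc >= min_gc and oe >= min_oe:
            end = start + window
            if islands and start <= islands[-1][1]:
                # overlapping passing windows are fused into one island
                islands[-1] = (islands[-1][0], max(islands[-1][1], end))
            else:
                islands.append((start, end))
    return islands
```

Because every passing window is merged with the overlapping ones, two CpG-rich stretches separated by a short CpG-poor gap can be fused into one long island, which is the kind of merging the abstract attributes to window-based methods.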

16.

Phylogenetic distribution of large-scale genome patchiness.

Oliver, José L; Bernaola-Galván, Pedro; Hackenberg, Michael; Carpena, Pedro.

BMC Evol Biol ; 8: 107, 2008 Apr 11.

Article in English | MEDLINE | ID: mdl-18405379

ABSTRACT

BACKGROUND: The phylogenetic distribution of large-scale genome structure (i.e. mosaic compositional patchiness) has been explored mainly by analytical ultracentrifugation of bulk DNA. However, with the availability of large, good-quality chromosome sequences and of recently developed computational methods that analyze patchiness directly on the genome sequence, an evolutionary comparative analysis can be carried out at the sequence level. RESULTS: The local variations in the scaling exponent of Detrended Fluctuation Analysis are used here to analyze large-scale genome structure and to directly uncover the characteristic scales present in genome sequences. Furthermore, shuffling experiments on selected genome regions identified computationally detected, isochore-like regions as the biological source of the uncovered large-scale genome structure. The phylogenetic distribution of short- and large-scale patchiness was determined in the best-sequenced genome assemblies from eleven eukaryotic genomes: mammals (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and Canis familiaris), birds (Gallus gallus), fishes (Danio rerio), invertebrates (Drosophila melanogaster and Caenorhabditis elegans), plants (Arabidopsis thaliana) and yeasts (Saccharomyces cerevisiae). We found large-scale patchiness of genome structure, associated with in silico determined, isochore-like regions, throughout this wide phylogenetic range. CONCLUSION: Large-scale genome structure is detected by directly analyzing DNA sequences in a wide range of eukaryotic chromosome sequences, from human to yeast. In all these genomes, large-scale patchiness can be associated with isochore-like regions, as detected directly in silico at the sequence level.

Subjects

Genome/genetics; Isochores/genetics; Phylogeny; Animals; Arabidopsis/genetics; Computational Biology; Dogs; Genome, Fungal/genetics; Genome, Human/genetics; Genome, Plant/genetics; Humans; Mice; Pan troglodytes/genetics; Rats; Saccharomyces cerevisiae/genetics; Sequence Analysis, DNA; Species Specificity
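The local DFA scaling exponent used in this study can be illustrated with a short sketch. It assumes the common purine/pyrimidine DNA-walk mapping (A, G to +1; C, T to -1) and first-order (linear) detrending in non-overlapping boxes; the abstract does not give the paper's exact mapping, detrending order or scale grid, and all names below are hypothetical.

```python
import math
import random


def dna_walk_steps(seq):
    """Map DNA to +/-1 steps: purines (A, G) -> +1, pyrimidines (C, T) -> -1.
    Other symbols (e.g. N) are skipped. The mapping is an assumption."""
    step = {"A": 1, "G": 1, "C": -1, "T": -1}
    return [step[b] for b in seq.upper() if b in step]


def dfa_fluctuation(x, n):
    """First-order DFA fluctuation F(n) for box size 2 <= n <= len(x)."""
    if n < 2 or n > len(x):
        raise ValueError("box size must satisfy 2 <= n <= len(x)")
    mean = sum(x) / len(x)
    profile, s = [], 0.0
    for v in x:  # integrated, mean-subtracted signal (the "walk")
        s += v - mean
        profile.append(s)
    t_mean = (n - 1) / 2
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    n_boxes = len(profile) // n  # non-overlapping boxes; the remainder is ignored
    total = 0.0
    for b in range(n_boxes):
        seg = profile[b * n:(b + 1) * n]
        seg_mean = sum(seg) / n
        slope = sum((t - t_mean) * (y - seg_mean) for t, y in enumerate(seg)) / sxx
        # squared residuals around the least-squares line fitted in this box
        total += sum((y - seg_mean - slope * (t - t_mean)) ** 2
                     for t, y in enumerate(seg))
    return math.sqrt(total / (n_boxes * n))


def local_alpha(x, scales):
    """Local scaling exponents: slopes of log F(n) vs log n between consecutive scales."""
    logs = [(math.log(n), math.log(dfa_fluctuation(x, n))) for n in scales]
    return [(f2 - f1) / (n2 - n1) for (n1, f1), (n2, f2) in zip(logs, logs[1:])]


def random_dna(length, seed=0):
    """Uncorrelated random sequence, useful as a reference (alpha near 0.5)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))
```

For an uncorrelated sequence the local exponent stays near 0.5 at every scale; exponents that rise above 0.5 over a range of scales point to correlations (patchiness) at those length scales. Recomputing the exponents after shuffling a region, in the spirit of the abstract's shuffling experiments, is one way to check whether that region is responsible for the departure.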

17.

CpGcluster: a distance-based algorithm for CpG-island detection.

Hackenberg, Michael; Previti, Christopher; Luque-Escamilla, Pedro Luis; Carpena, Pedro; Martínez-Aroza, José; Oliver, José L.

BMC Bioinformatics ; 7: 446, 2006 Oct 12.

Article in English | MEDLINE | ID: mdl-17038168

ABSTRACT

BACKGROUND: Despite their involvement in the regulation of gene expression and their importance as genomic markers for promoter prediction, no objective standard exists for defining CpG islands (CGIs), since all current approaches rely on a large parameter space formed by thresholds on length, CpG fraction and G+C content. RESULTS: Given the higher frequency of CpG dinucleotides at CGIs compared with bulk DNA, the distributions of distances between neighboring CpGs should differ for bulk and island CpGs. A new algorithm (CpGcluster) is presented that is based on the physical distance between neighboring CpGs on the chromosome and directly predicts clusters of CpGs without depending on the subjective criteria mentioned above. By assigning a p-value to each of these clusters, the most statistically significant ones can be predicted as CGIs. CpGcluster was benchmarked against five other CGI finders using a test sequence set assembled from an experimental CGI library. CpGcluster reached the highest overall accuracy values while showing the lowest rate of false-positive predictions. Since a minimum-length threshold is not required, CpGcluster can find short but fully functional CGIs that are usually missed by other algorithms. The CGIs predicted by CpGcluster show the lowest degree of overlap with Alu retrotransposons and, simultaneously, the highest overlap with vertebrate Phylogenetic Conserved Elements (PhastCons). CpGcluster's CGIs overlapping the Transcription Start Site (TSS) show the highest statistical significance compared with islands in other genome locations, qualifying CpGcluster as a valuable tool for discriminating functional CGIs from the remaining islands in the bulk genome. CONCLUSION: CpGcluster uses only integer arithmetic, making it a fast and computationally efficient algorithm able to predict statistically significant clusters of CpG dinucleotides. Another outstanding feature is that all predicted CGIs start and end with a CpG dinucleotide, which should be appropriate for a genomic feature whose functionality is based precisely on CpG dinucleotides. Unlike previous algorithms, the only search parameter in CpGcluster is the distance between two consecutive CpGs. Therefore, none of the main statistical properties of CpG islands (G+C content, CpG fraction or length threshold) is needed as a search parameter, which may lead to the high specificity and low overlap with spurious Alu elements observed for CpGcluster predictions.

Subjects

Algorithms; CpG Islands/genetics; Animals; Genome/genetics; Humans; Mice
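The core idea described in this abstract, grouping CpGs by the integer distance between neighbors so that every cluster starts and ends with a CpG, can be sketched as below. This is an illustrative reimplementation under stated assumptions, not the published tool: the distance threshold is passed in directly (the abstract names it as the only search parameter but does not say how it is chosen), single isolated CpGs are dropped, and the binomial tail probability is a hypothetical stand-in null model that may differ from the statistic CpGcluster actually uses.

```python
import math


def cpg_positions(seq):
    """0-based positions of the C of every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


def distance_clusters(positions, max_dist):
    """Group neighbouring CpGs whose integer C-to-C distance is <= max_dist.
    Returns (start, end, n_cpg) with end exclusive, so each cluster starts at
    the C of its first CpG and ends at the G of its last CpG.
    Isolated single CpGs are dropped (an assumption of this sketch)."""
    clusters = []
    if not positions:
        return clusters
    first = prev = positions[0]
    count = 1
    for p in positions[1:]:
        if p - prev <= max_dist:
            count += 1
        else:
            if count >= 2:
                clusters.append((first, prev + 2, count))
            first, count = p, 1
        prev = p
    if count >= 2:
        clusters.append((first, prev + 2, count))
    return clusters


def binomial_tail_pvalue(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space
    to avoid overflow. Hypothetical null model, not CpGcluster's own."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_terms = [math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * math.log(p) + (n - i) * math.log1p(-p)
                 for i in range(k, n + 1)]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))
```

A cluster `(start, end, k)` could then be scored as `binomial_tail_pvalue(k, end - start - 1, p)`, taking `p` as the genome-wide CpG frequency per dinucleotide position; this scoring choice is likewise an assumption made for illustration.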

18.

Specific heat of random fractal energy spectra.

Coronado, Ana V; Carpena, Pedro.

Phys Rev E Stat Nonlin Soft Matter Phys ; 73(1 Pt 2): 016124, 2006 Jan.

Article in English | MEDLINE | ID: mdl-16486233

ABSTRACT

The specific heat corresponding to systems with deterministic fractal energy spectra is known to present log-periodic oscillations as a function of the temperature T in the low-T region, around a mean value given by a characteristic dimension of the energy spectrum. It is generally assumed that the presence of disorder does not strongly affect these results and that the fractal structure of the energy spectrum dominates. In this paper, we study the properties of the specific heat derived from random fractal energy spectra as a function of the degree of disorder present in the spectra. To study the influence of disorder, we analyze the specific heat using three different properties: its mean value, and the periods and amplitudes of its oscillations around that mean value. By studying the distributions and mean values of these three properties, we find that disorder has little influence on the mean value of the specific heat. However, for the periods and amplitudes, we find a critical value of the disorder present in the energy spectra. Below this critical value, disorder has a weak effect and the behavior is quasideterministic, indicating that the fractal structure is the dominant effect; above it, disorder dominates and the behavior of the specific heat is practically chaotic.
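The specific heat of a discrete spectrum follows from the canonical ensemble as $C(T) = \frac{\langle E^2\rangle - \langle E\rangle^2}{k_B T^2}$. The sketch below evaluates it with $k_B = 1$ for a deterministic triadic Cantor spectrum, which is an assumed example; the abstract does not specify the paper's random spectra or disorder parameter, but any list of levels, randomized or not, can be passed to the same function. Function names are hypothetical.

```python
import math


def cantor_levels(generations):
    """Energy levels of the triadic Cantor set after `generations` steps.
    Each level equals sum_j a_j * 2 / 3**j with a_j in {0, 1}, so all lie in [0, 1]."""
    levels = [0.0]
    for _ in range(generations):
        levels = [e / 3 for e in levels] + [e / 3 + 2 / 3 for e in levels]
    return levels


def specific_heat(levels, T):
    """Canonical specific heat with k_B = 1: C(T) = (<E^2> - <E>^2) / T^2."""
    e0 = min(levels)  # shift energies so the largest Boltzmann weight is 1
    weights = [math.exp(-(e - e0) / T) for e in levels]
    z = sum(weights)
    mean = sum(w * e for w, e in zip(weights, levels)) / z
    # two-pass variance, numerically safer than <E^2> - <E>^2
    var = sum(w * (e - mean) ** 2 for w, e in zip(weights, levels)) / z
    return var / T ** 2
```

For this spectrum, averaging C over log T across whole periods (factors of 3 in T) at low temperature gives a value close to ln 2 / ln 3 ≈ 0.63, the similarity dimension of the Cantor set, in line with the abstract's remark that the oscillations are centred on a characteristic dimension of the spectrum.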

19.

The biased distribution of Alus in human isochores might be driven by recombination.

Hackenberg, Michael; Bernaola-Galván, Pedro; Carpena, Pedro; Oliver, José L.

J Mol Evol ; 60(3): 365-77, 2005 Mar.

Article in English | MEDLINE | ID: mdl-15871047

ABSTRACT

Alu retrotransposons are not homogeneously distributed over the human genome; they have a higher density in GC-rich (H) than in AT-rich (L) isochores. However, since they preferentially insert into L isochores, the question arises: what evolutionary mechanism shifts the Alu density maximum from L to H isochores? To disclose the role played by each of the potential mechanisms involved in this biased distribution, we carried out a genome-wide analysis of Alu density as a function of evolutionary age, isochore membership, and intronic vs. intergenic location. Since Alus depend on the retrotransposase encoded by LINE1 elements, we also studied the distribution of LINE1 to provide a complete evolutionary scenario. We successively test, and discard, the contributions of Alu/LINE1 competition for retrotransposase, compositional matching pressure, and Alu overrepresentation in introns. To analyze the role played by unequal recombination, we scan the genome for Alu trimers, a direct product of Alu-Alu recombination. Through computer simulations, we show that such trimers are much more frequent than expected, with a higher observed/expected ratio in L than in H isochores. This result, together with the known higher selective disadvantage of recombination products in H isochores, points to Alu-Alu recombination as the main agent driving the density shift of Alus toward the GC-rich parts of the genome. Two independent pieces of evidence support this conclusion: the lower evolutionary divergence shown by recently inserted Alu subfamilies, and the higher frequency of old stand-alone Alus in L isochores. Other evolutionary factors, such as population bottlenecks during primate speciation, may have accelerated the accumulation of Alus in GC-rich isochores.

Subjects

Alu Elements/genetics; Evolution, Molecular; Isochores/genetics; Models, Genetic; Recombination, Genetic/genetics; Algorithms; Base Composition/genetics; Computational Biology; Computer Simulation; Genomics/methods; Humans; Long Interspersed Nucleotide Elements/genetics
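The observed/expected logic behind the Alu-trimer test can be sketched with a Monte Carlo null model. The abstract defines neither a trimer nor the simulations, so both are assumptions here: a "trimer" is taken to be three consecutive elements whose successive start-to-start gaps are all within a hypothetical `max_gap`, and the expectation comes from dropping the same number of elements uniformly at random into the region. Function names are hypothetical.

```python
import random


def count_trimers(starts, max_gap):
    """Count runs of three consecutive elements (sorted by start) whose two
    successive gaps are both <= max_gap. Overlapping runs are each counted.
    This trimer definition is hypothetical."""
    s = sorted(starts)
    return sum(1 for i in range(len(s) - 2)
               if s[i + 1] - s[i] <= max_gap and s[i + 2] - s[i + 1] <= max_gap)


def trimer_obs_exp(starts, region_length, max_gap, n_sim=200, seed=0):
    """Observed/expected trimer ratio. The expectation is the mean count over
    n_sim simulations placing len(starts) elements uniformly at random in
    [0, region_length)."""
    rng = random.Random(seed)
    observed = count_trimers(starts, max_gap)
    n = len(starts)
    simulated = [count_trimers([rng.randrange(region_length) for _ in range(n)], max_gap)
                 for _ in range(n_sim)]
    expected = sum(simulated) / n_sim
    return observed / expected if expected > 0 else float("inf")
```

Running `trimer_obs_exp` separately on L- and H-isochore coordinates would yield the per-isochore ratios that the abstract compares. A uniform null ignores insertion preferences and other local constraints, which a more faithful simulation would need to model.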

20.

Effect of nonlinear filters on detrended fluctuation analysis.

Chen, Zhi; Hu, Kun; Carpena, Pedro; Bernaola-Galvan, Pedro; Stanley, H Eugene; Ivanov, Plamen Ch.

Phys Rev E Stat Nonlin Soft Matter Phys ; 71(1 Pt 1): 011104, 2005 Jan.

Article in English | MEDLINE | ID: mdl-15697577

ABSTRACT

When investigating the dynamical properties of complex multiple-component physical and physiological systems, it is often the case that the measurable output of the system does not directly represent the quantity we want to probe in order to understand the underlying mechanisms. Instead, the output signal is often a linear or nonlinear function of the quantity of interest. Here, we investigate how various linear and nonlinear transformations affect the correlation and scaling properties of a signal, using detrended fluctuation analysis (DFA), which has been shown to accurately quantify power-law correlations in nonstationary signals. Specifically, we study the effect of three types of transforms: (i) linear ($y(i) = a\,x(i) + b$), (ii) nonlinear polynomial ($y(i) = a\,x^{k}(i)$), and (iii) nonlinear logarithmic ($y(i) = \log[x(i) + \Delta]$) filters. We compare the correlation and scaling properties of signals before and after the transform. We find that linear filters do not change the correlation properties, while the effect of nonlinear polynomial and logarithmic filters strongly depends on (a) the strength of correlations in the original signal, (b) the power $k$ of the polynomial filter, and (c) the offset $\Delta$ in the logarithmic filter. We further apply the DFA method to investigate the "apparent" scaling of three analytic functions: (i) exponential ($\exp(\pm x + a)$), (ii) logarithmic ($\log(x + a)$), and (iii) power law ($(x + a)^{\lambda}$), which are often encountered as trends in physical and biological processes. Although these three functions have different characteristics, we find a broad range of values of the parameter $a$, common to all three functions, over which the slopes of the DFA curves are identical. We further note that the DFA results obtained for a class of other analytic functions can be reduced to these three typical cases. We systematically test the performance of the DFA method when estimating long-range power-law correlations in the output signals for different parameter values in the three types of filters and the three analytic functions we consider.

Subjects

Algorithms; Models, Biological; Models, Statistical; Nonlinear Dynamics; Animals; Computer Simulation; Humans; Statistics as Topic
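The three filters studied in this paper can be applied to a signal and then passed through DFA with the short sketch below. It uses first-order DFA and hypothetical function names, and is meant only to make the transforms concrete, not to reproduce the paper's simulations. For the linear filter the result is exact: the offset b disappears when the mean is subtracted and the profile is multiplied by a, so F(n) scales by |a| and the scaling exponent is unchanged, consistent with the abstract.

```python
import math
import random


def dfa(x, n):
    """First-order DFA fluctuation F(n) for box size 2 <= n <= len(x)."""
    mean = sum(x) / len(x)
    profile, s = [], 0.0
    for v in x:  # integrated, mean-subtracted signal
        s += v - mean
        profile.append(s)
    t_mean = (n - 1) / 2
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    n_boxes = len(profile) // n
    total = 0.0
    for b in range(n_boxes):
        seg = profile[b * n:(b + 1) * n]
        m = sum(seg) / n
        slope = sum((t - t_mean) * (y - m) for t, y in enumerate(seg)) / sxx
        total += sum((y - m - slope * (t - t_mean)) ** 2 for t, y in enumerate(seg))
    return math.sqrt(total / (n_boxes * n))


def linear_filter(x, a, b):
    """(i) y(i) = a * x(i) + b"""
    return [a * v + b for v in x]


def polynomial_filter(x, a, k):
    """(ii) y(i) = a * x(i)**k; use an integer k if x can be negative."""
    return [a * v ** k for v in x]


def log_filter(x, delta):
    """(iii) y(i) = log(x(i) + delta); requires x(i) + delta > 0."""
    return [math.log(v + delta) for v in x]


def white_noise(length, seed=0):
    """Uncorrelated Gaussian test signal."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]
```

The nonlinear filters are included so that the dependence on k and Δ described in the abstract can be explored on signals with different correlation strengths; no particular outcome is implied by this sketch.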
